Using Hearst's Rules for the Automatic Acquisition of Hyponyms for Mining a Pharmaceutical Corpus
Abstract
Fully Automatic Thesaurus Generation (ATG) seeks to generate useful thesauri by mining a corpus of raw text. A number of statistical approaches based on term co-occurrence exist for this, but in general they can only estimate the strength of the relationship between two terms, not its nature. In this paper we implement Hearst's method of discovering the hyponymy relations which are the building blocks of hierarchical thesauri. We started with the Scrip corpus of newsfeeds in the pharmaceutical domain, and were able to discover an estimated 400 useful term relationships.
A domain-specific thesaurus such as MeSH (MEDLINE) or the Derwent Drug File (DDF) gives an overview of the extent of the domain, and of the categories, relations and named entities within it. Such thesauri typically consist of lists of terms organised according to a semantic hierarchy. Electronic thesauri are used in document retrieval and indexing systems, for expanding queries when searching for information or for selecting a preferred form of a given search term. Experiments such as the Worm Community System have shown that the thesaurus is an excellent memory-jogging device which supports learning and serendipitous browsing. Thesauri prevent users from becoming overwhelmed by the sheer amount of available information and by the "classical vocabulary problem, which results from the diversity of expertise and backgrounds of systems users" (Chen et al., 95).
Although a number of successful, commercially available thesauri created by large teams of human experts exist, manual thesaurus generation is in general prohibitively costly. Grefenstette (94) writes that the ideal might be to use knowledge-poor approaches, starting from just the raw corpus, "if the ultimate goal of ATG (Automatic Thesaurus Generation) is the deduction of semantic relationships exclusively from free text corpora". ATG is thus an example of knowledge discovery in text databases, or text data mining.
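To give a flavour of how hyponymy relations can be mined from raw text, the following is a minimal sketch of one lexico-syntactic pattern of the kind Hearst's method is known for ("X such as A, B and C"). It is not the implementation used in the paper: the names `SUCH_AS` and `extract_hyponyms` are hypothetical, the example sentence is invented, and single words matched by a regular expression stand in for proper noun-phrase chunking.

```python
import re

# Hypothetical, simplified pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
# Single words (\w+) stand in for noun phrases, which a real system would chunk first.
SUCH_AS = re.compile(
    r"(?P<hyper>\w+),? such as "
    r"(?P<hypos>\w+(?:, (?!and\b|or\b)\w+)*(?:,? (?:and|or) \w+)?)"
)


def extract_hyponyms(sentence):
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group("hyper")
        # Split the captured list on commas and on a final "and" / "or".
        hyponyms = re.split(r",? (?:and|or) |, ", match.group("hypos"))
        pairs.extend((hyponym, hypernym) for hyponym in hyponyms if hyponym)
    return pairs


if __name__ == "__main__":
    text = "antidepressants such as fluoxetine, sertraline and paroxetine are widely prescribed"
    for hyponym, hypernym in extract_hyponyms(text):
        print(f"{hyponym} IS-A {hypernym}")
```

Unlike a co-occurrence score, each extracted pair carries the nature of the relationship (hyponym, hypernym), which is what makes such patterns suitable for building a hierarchical thesaurus.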
Most existing methods for automatic thesaurus generation are statistical, and rely on the co-occurrence of a pair of terms within a common "window" of text, which may be a fixed number of words, the same syntactic clause, or a common document in a large collection of documents. Details of such approaches were given first by Salton in 1989, and more recently by Pereira et al. (93) and Kageura et al. (00). For each word pair in the corpus vocabulary, such methods are able to generate a numeric score to …
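The window-based co-occurrence idea described above can be sketched as follows, assuming the simplest case of a fixed-size word window. The function name `cooccurrence_counts`, the `window` parameter and the sample tokens are illustrative assumptions, and a raw pair count stands in for whatever numeric score a particular statistical method would actually compute.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=3):
    """Count unordered term pairs that fall within `window` consecutive tokens."""
    counts = Counter()
    for i, left in enumerate(tokens):
        # Pair the current token with the next (window - 1) tokens.
        for right in tokens[i + 1 : i + window]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts


if __name__ == "__main__":
    tokens = "aspirin reduces fever and aspirin reduces pain".split()
    for pair, count in cooccurrence_counts(tokens).most_common(3):
        print(pair, count)
```

A count of this kind indicates how strongly two terms are associated in the corpus, but says nothing about whether one term is a kind of, a part of, or merely related to the other.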
Similar resources
Semi-automatic construction of a corpus of indirect opinions in the drug domain and its use for determining opinion polarity
Opinion mining is a well-known problem in natural language processing that has attracted increasing attention in recent years. Existing approaches have been often focused on identifying direct opinions and ignored indirect ones. However, in some domains such as medical, indirect opinions occur frequently. Therefore, ignoring indirect opinions can lead to the loss of valuable information and not...
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
Automatic Acquisition of Hyponyms and Meronyms from Question Corpora
We explore how lexical and ontological relations can be acquired automatically from natural language questions. The focus in this paper is on identifying hyponym and meronym relations by using simple pattern matching. It is shown that natural language questions can provide a significant source for ontological information.
Acquisition of Hypernyms and Hyponyms from the WWW
Recently research in automatic ontology construction has become a hot topic, because of the vision that ontology will be the core component to realize the semantic web. This paper presents a method to automatically construct ontology by mining the web. We introduce an algorithm to automatically acquire hypernyms and hyponyms for any given lexical term using search engine and natural language pr...
Explain the theoretical and practical model of automatic facade design intelligence in the process of implementing the rules and regulations of facade design and drawing
Artificial intelligence has been trying for decades to create systems with human capabilities, including human-like learning; therefore, the purpose of this study is to discover how to use this field in the process of learning facade design, specifically learning the rules and standards and national regulations related to the design of facades of residential buildings by machine with a machine ...
Publication date: 2005